Abstract

Using differential equations, we examine the GREEDY algorithm studied by Azar, Broder, Karlin, and 
Upfal for distributed load balancing [1]. This approach yields accurate estimates of the actual load distribution,
provides insight into the exponential improvement GREEDY offers over simple random selection, and
allows one to prove tight concentration theorems about the loads in a straightforward manner. 

1 Introduction 

Suppose that n balls are placed into n bins, with each ball being placed into a bin chosen independently and 
uniformly at random. Let the load of a bin be the number of balls in that bin after all balls have been thrown. 
It is well known that with high probability, the maximum load upon completion will be approximately $\frac{\log n}{\log \log n}$
[8]. (We use with high probability to mean with probability at least $1 - O(1/n)$, where n is the number of balls.
Also, log will always mean the natural logarithm, unless otherwise noted.) 

Azar, Broder, Karlin, and Upfal considered how much more evenly distributed the load would be if each ball 
had two (or more) choices [1]. Suppose that the balls are placed sequentially, and each ball is placed into the 
less full of two bins chosen independently and uniformly at random with replacement (breaking ties arbitrarily). 
In this case, they showed that the maximum load drops to $\frac{\log \log n}{\log 2} + O(1)$ with high probability. If each ball
instead has d choices, then the maximum load will be $\frac{\log \log n}{\log d} + O(1)$ with high probability. Having two choices
hence yields a qualitatively different type of behavior from the single choice case, leading to an exponential 
improvement in the maximum load; having more than two choices further improves the maximum load by only 
a constant factor. This result has important implications for distributed load balancing, hashing, and PRAM 
simulation [1]. 

Following Azar et al., we refer to the algorithm in which each ball has d random choices as GREEDY(d).
In this paper, we develop an alternative method of studying the performance of GREEDY(d) using differential
equations. The differential equations describe the limiting performance of GREEDY(d) as the number of balls
and bins grows to infinity. As we will demonstrate, the description of the limiting performance proves highly
accurate, even when n is relatively small. Besides giving perhaps more insight into the actual performance of
GREEDY, our method provides a great deal of intuition behind the proof of the behavioral difference between
one and two choices.

*A preliminary version of this work appeared in the Proceedings of the 37th Annual Symposium on the Foundations of Computer
Science, October 1996.

†Much of this work was done at U.C. Berkeley, supported by a fellowship from the Office of Naval Research and by NSF grant
CCR-9505448.

Our motivation in studying this problem is twofold. First, we wish to demonstrate and highlight this methodology
and encourage its use for studying random processes. While this methodology is by no means new, its uses
have been surprisingly limited. The technical results justifying the relationship between families of Markov
processes and differential equations date back at least to Kurtz [13, 14]. Karp and Sipser provided an early use
of this technology to study an algorithm for finding maximum matchings in sparse random graphs [11]. Other
past applications in the analysis of algorithms include [9, 12], and more recently many more have been found 
(see, for example, [2, 3, 15, 17, 21] to name a few). 

Our second motivation is to demonstrate the power of using two choices. This idea dates back at least as far as 
the work of Eager, Lazowska, and Zahorjan [7], who examined the problem in a dynamic load balancing setting
based on viewing processors as single server queues. In the static setting, this idea was also studied by Hajek
[9], who used the same approach we undertake to determine the fraction of empty bins. The aforementioned
exponential improvement in behavior was noted and proven first in a paper by Karp, Luby, and Meyer auf der
Heide [10]. The work by Azar et al. examined a simpler model that clarified the argument as well as provided
many new results. Related work by the author [16, 17], as well as by others [20], examines the power of two 
choices in dynamic settings. Continued work in the area includes recent work by Stemann [19] and Czumaj 
and Stemann [6]. 

In the rest of the paper, we explain the derivation of the differential equations that describe the GREEDY strategy 
of [1] and compare the results from the differential equations with simulations. We also demonstrate how the 
equations give more insight into the behavior of GREEDY and how the equations relate to the work in [1]. 

2 The Differential Equations 

In this section, we demonstrate how to establish a family of differential equations that can be used to model the 
behavior of the GREEDY strategy of [1]. Recall that we begin with m balls and n bins. Balls arrive sequentially, 
and upon arrival, each ball chooses d bins independently and uniformly at random (with replacement); the ball 
is then placed in the least loaded of these bins (with ties broken arbitrarily). 
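The process just described is short enough to simulate directly. The following is a minimal sketch, not the code used for the simulations reported in this paper; the function name `greedy_loads` and the first-minimum tie-breaking rule are assumptions made for illustration.

```python
import random


def greedy_loads(m, n, d, rng):
    """Throw m balls into n bins with GREEDY(d) and return the list of bin loads.

    Each ball picks d bins independently and uniformly at random (with
    replacement) and is placed in the least loaded of them.
    """
    loads = [0] * n
    for _ in range(m):
        choices = [rng.randrange(n) for _ in range(d)]
        # min() keeps the first minimal choice, which is one arbitrary
        # way of breaking ties (an assumption of this sketch).
        target = min(choices, key=loads.__getitem__)
        loads[target] += 1
    return loads


if __name__ == "__main__":
    rng = random.Random(0)
    n = 20_000
    for d in (1, 2, 3):
        loads = greedy_loads(n, n, d, rng)
        empty = sum(1 for x in loads if x == 0)
        print(f"d={d}: max load {max(loads)}, empty fraction {empty / n:.4f}")
```

Running the loop for d = 1, 2, 3 with m = n gives a quick empirical feel for the gap between one choice and two.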

We first ask how many bins remain empty after the protocol GREEDY(d) terminates. This question has a natural
interpretation in the task-processor model: how many of our processors are not utilized? The question can also 
be seen as a matching problem on random bipartite graphs: given a bipartite graph with n vertices on each side 
such that each vertex on the left has d edges to vertices chosen independently and uniformly at random on the 
right, what is the expected size of the greedy matching obtained by sequentially matching vertices on the left 
to a random unmatched neighbor? Our attack, again, is to consider this system as $n \to \infty$. This question has
been previously solved by Hajek using entirely similar techniques [9]. We shall begin by briefly repeating his 
argument with some additional insights. Once we show how to answer the question of the number of empty 
bins, we shall extend it to the more general load balancing problem. 
2.1 The empty bins problem



We set up the problem of the number of empty bins by developing a Markov chain with a simple state that
describes the balls and bins process. We first establish a concept of time. Let Y(T) be the number of non-empty
bins after T balls have been thrown. Then $\{Y(i)\}$, $i = 0, \ldots, m$, is clearly a Markov chain. Moreover

$$ E[Y(T+1) - Y(T)] = 1 - \left(\frac{Y(T)}{n}\right)^d, $$

since the probability that a ball finds all non-empty bins among its d choices is $(Y(T)/n)^d$.

The notation becomes somewhat more convenient if we scale by a factor of n. We let t be the time at which
exactly nt balls have been thrown, and we let y(t) be the fraction of non-empty bins. The claim is that the
random process described above is well approximated by the trajectory of the differential equation

$$ \frac{dy}{dt} = 1 - y^d, \tag{1} $$

where this equation has been obtained by taking the expected change in y per unit of time as the limiting value
of dy/dt. That the random process given by the Markov chain closely follows the trajectory given by the
differential equation follows easily from known techniques, such as Kurtz's Theorem or the similar work on random
graphs by Wormald. (As previously mentioned, the balls and bins process has a natural interpretation in terms
of random bipartite graphs.) This connection with the differential equation yields the following theorem:



Theorem 1 Suppose cn balls are thrown into n bins according to the GREEDY(d) protocol for some constant
c. Let $Y_{cn}$ be the number of non-empty bins when the process terminates. Then $\lim_{n \to \infty} E[Y_{cn}/n] = y_c$, where
$y_c < 1$ satisfies

$$ c = \sum_{i=0}^{\infty} \frac{y_c^{id+1}}{id+1}. $$



Proof: The preconditions for Kurtz's theorem (see [18, Theorem 5.3] or [14, Chapter 8]) are easily checked
for the one-dimensional system described by (1), so by Kurtz's theorem we have that this differential equation
is the correct limiting process.¹ Instead of solving (1) for y in terms of t, we solve for t in terms of y:

$$ \frac{dt}{dy} = \frac{1}{1 - y^d} = \sum_{i=0}^{\infty} y^{id}. $$

We integrate up to some time t, yielding

$$ t = \sum_{i=0}^{\infty} \frac{y(t)^{id+1}}{id+1}. \tag{2} $$



From equation (2), given d we can solve for y(t) for any value of t using, for example, binary search. One can
also attempt to find an equation for y in terms of d and t; standard integral tables give such equations when
d = 2, 3, and 4, for example. When t = c, all of the balls have been thrown, and the process terminates.

¹It appears that there might be a problem here since we consider events occurring at discrete time steps, instead of according to
random times from a Poisson process. One can always adopt the convention that each discrete time step corresponds to an amount of
time given by an exponentially distributed random variable. In the limiting case, this distinction disappears.
Plugging t = c into equation (2) yields the theorem, with $y_c = y(c)$.
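The binary search mentioned in the proof can be sketched as follows. The code assumes the series form $t = \sum_{i \ge 0} y^{id+1}/(id+1)$ obtained by integrating $1/(1 - y^d)$ term by term; the names `series_time` and `nonempty_fraction`, the tolerance, and the iteration count are hypothetical choices made for this illustration. It is intended for moderate values of c, since the series converges slowly when y is very close to 1.

```python
import math


def series_time(y, d, target, tol=1e-15, max_terms=10_000_000):
    """Partial sums of t(y) = sum_{i>=0} y^(i*d+1) / (i*d+1).

    All terms are non-negative, so summation stops as soon as the partial
    sum exceeds `target` or the terms become negligible.
    """
    total = 0.0
    power = y          # holds y^(i*d+1), starting with i = 0
    step = y ** d
    for i in range(max_terms):
        term = power / (i * d + 1)
        total += term
        if total > target or term < tol:
            break
        power *= step
    return total


def nonempty_fraction(c, d, iterations=60):
    """Binary search for y in [0, 1) with t(y) = c; t is increasing in y."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if series_time(mid, d, c) > c:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    for d in (1, 2, 3):
        print(d, nonempty_fraction(1.0, d))
```

Two closed forms give a sanity check: for d = 1 the equation dy/dt = 1 - y solves to y(t) = 1 - e^{-t}, and for d = 2 the equation dy/dt = 1 - y^2 solves to y(t) = tanh(t), so the search should agree with `1 - math.exp(-c)` and `math.tanh(c)` respectively.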



We may actually use Kurtz's Theorem to obtain a concentration result (again, see [18, Theorem 5.3] or [14, 
Chapter 8]). 

Theorem 2 In the notation of Theorem 1, $|Y_{cn}/n - y_c|$ is $O\left(\sqrt{\frac{\log n}{n}}\right)$ with high probability, where the constant
depends on c.

One can also obtain entirely similar bounds on $Y_{cn}$ using more straightforward martingale arguments. In the
following, we assume familiarity with basic martingale theory; see, for example, [4, Chapter 7] for more 
information. We use the following form of the martingale tail inequality due to Azuma [5]: 

Lemma 3 [Azuma] Let $X_0, X_1, \ldots, X_m$ be a martingale sequence such that for each k,

$$ |X_k - X_{k-1}| \le 1. $$

Then for any $\alpha > 0$,

$$ \Pr(|X_m - X_0| > \alpha \sqrt{m}) < 2e^{-\alpha^2/2}. $$

Theorem 4 In the notation of Theorem 1, $\Pr(|Y_{cn} - E[Y_{cn}]| > \alpha \sqrt{cn}) < 2e^{-\alpha^2/2}$ for any $\alpha > 0$.

Proof: For $0 \le j \le cn$, let $\mathcal{F}_j$ be the $\sigma$-field of events corresponding to the possible states after j balls have
been placed, and $Z_j = E[Y_{cn} \mid \mathcal{F}_j]$ be the associated conditional expectation of $Y_{cn}$. Then the random variables
$\{Z_j\}_{j=0}^{cn}$ form a Doob martingale, and it is clear that $|Z_j - Z_{j-1}| \le 1$. The theorem now follows from Lemma 3.



Theorem 4 implies that $Y_{cn}$ is within $O(\sqrt{n \log n})$ of its expected value with high probability; however, the
martingale approach does not immediately lead us to the value to which $Y_{cn}/n$ converges. This is a standard
limitation of the martingale approach: in contrast, the differential equations approach allows us to find the mean
as well as prove concentration around the mean.
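To see where the $\sqrt{n \log n}$ scale comes from, one can substitute a specific value of the deviation parameter into the tail bound of Theorem 4. The choice $\alpha = \sqrt{2 \log n}$ below is one convenient option assumed for this worked step, not the only one:

```latex
\alpha = \sqrt{2 \log n}
\;\Longrightarrow\;
\Pr\!\left( \bigl| Y_{cn} - E[Y_{cn}] \bigr| > \sqrt{2 c\, n \log n} \right)
< 2 e^{-\alpha^2 / 2} = 2 e^{-\log n} = \frac{2}{n}.
```

The failure probability is therefore $O(1/n)$, which matches the definition of with high probability used in this paper, and the deviation is $\sqrt{2c} \cdot \sqrt{n \log n}$ for constant c.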



2.2 Bins with fixed load 



We can extend the previous analysis to find the fraction of bins with load k for any constant k as $n \to \infty$. We
first establish the appropriate Markov chain. Let $s_i(t)$ be the fraction of bins with load at least i at time t, where
again at time t exactly nt balls have been thrown. Then the corresponding differential equations regarding the
growth of the $s_i$ (for $i \ge 1$) are easily determined:

$$ \frac{ds_i}{dt} = s_{i-1}^d - s_i^d \quad \text{for } i \ge 1; \qquad s_0 = 1. \tag{3} $$
                  d = 2                                d = 3
         Prediction   1 million simulation    Prediction    1 million simulation
  s_1    0.7616       0.7616                  [illegible]   [missing]
  s_2    0.2295       0.2295                  0.1765        0.1765
  s_3    0.0089       0.0089                  0.00051       0.00051
  s_4    0.000006     0.000007                < 10^-11      0
  s_5    < 10^-11     0                       < 10^-11      0

Table 1: Predicted behavior for GREEDY(d) and average results from 100 simulations with 1 million balls.

The differential equations have the following simple interpretation: for there to be an increase in the number of 
bins with at least i balls, the d choices of a ball about to be placed must all be bins with load at least i - 1, but
not all bins with load at least i. 

In contrast to Section 2.1, where we could derive a formula for the fraction of empty bins, we are not aware of
how to determine explicit formulae for $s_i(t)$ in general. These systems of differential equations can be solved
numerically using standard methods, however; for up to any fixed k and t, we can accurately determine $s_k(t)$.
By applying Kurtz's theorem (as in Theorem 2) or martingale arguments (as in Theorem 4) one can show that
these results will be accurate with high probability.
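As one illustration of such a standard method, the sketch below integrates the system $ds_i/dt = s_{i-1}^d - s_i^d$ with $s_0 = 1$ using the classical fourth-order Runge-Kutta scheme. The choice of integrator, the step count, and the names `derivatives` and `solve_tails` are assumptions of this example; the paper does not say which solver was used. Because the equation for $s_i$ involves only $s_{i-1}$ and $s_i$, cutting the system off at level k introduces no truncation error in the lower levels.

```python
import math


def derivatives(s, d):
    """Right-hand side of the system; s[0] is the constant s_0 = 1."""
    ds = [0.0] * len(s)
    for i in range(1, len(s)):
        ds[i] = s[i - 1] ** d - s[i] ** d
    return ds


def solve_tails(d, k, t_end=1.0, steps=2000):
    """Return [s_0(t_end), ..., s_k(t_end)], starting from empty bins."""
    s = [1.0] + [0.0] * k
    h = t_end / steps
    for _ in range(steps):
        k1 = derivatives(s, d)
        k2 = derivatives([x + 0.5 * h * a for x, a in zip(s, k1)], d)
        k3 = derivatives([x + 0.5 * h * a for x, a in zip(s, k2)], d)
        k4 = derivatives([x + h * a for x, a in zip(s, k3)], d)
        s = [
            x + h / 6.0 * (a + 2 * b + 2 * c + e)
            for x, a, b, c, e in zip(s, k1, k2, k3, k4)
        ]
    return s


if __name__ == "__main__":
    for d in (2, 3):
        tails = solve_tails(d, 5)
        print(d, [f"{x:.6g}" for x in tails[1:]])
        # For d = 2 the first component has the closed form tanh(t).
    print(math.tanh(1.0))
```

Two consistency checks are available without a reference solver: for d = 2 the first component must equal tanh(t), and at t = 1 the tails $s_1 + s_2 + \cdots$ must sum to (almost exactly) 1, since that sum is the average load per bin.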

We also demonstrate that our technique accurately predicts the behavior of the GREEDY(d) algorithm by comparing
with simulation results. The first and third columns of Table 1 show the values of $s_i(1)$ for d = 2 and
d = 3 as calculated from the differential equations. We use these values as predictions for the process where
we throw n balls into n bins. From the predictions with d = 2, one would not expect to see bins with load
five until billions of balls have been thrown. Similarly, choosing d = 3 one expects a maximum load of three
until billions of balls have been thrown. These results match our simulations of several hundred runs with up
to thirty-two million balls, the largest simulation we have attempted. We also present the averages from one
hundred simulations of one million balls for d = 2 and d = 3, which demonstrate the accuracy of the technique
in predicting the behavior of the system. Further simulations reveal that, in general, the solution given by the
limiting system of differential equations becomes more accurate as n grows, and the deviation from this solution
is small, as one would expect. This accuracy is a marked advantage of this approach; previous techniques
have not provided ways of concretely predicting actual performance.

2.3 Relationship to O(log log n) bounds

We can also use the differential equations to provide an alternative derivation of a key idea from the proof of
the upper bounds on the maximum load of GREEDY(d). The approach of looking at the underlying differential
equations provides insight into how the $s_i$ decrease, which is essential to determining the O(log log n) bounds.

We begin by focusing on the case where the number of balls m equals the number of bins n, and consider the
limiting description given by the differential equations as $n \to \infty$.

Lemma 5 For the family of differential equations given by (3), $s_i(1) \le [s_{i-1}(1)]^d$.

Proof: We wish to know the values of $s_i(1)$. Because the $s_i$ are all non-decreasing over time and non-negative,
from (3)

$$ \frac{ds_i}{dt} = [s_{i-1}(t)]^d - [s_i(t)]^d \le [s_{i-1}(t)]^d \le [s_{i-1}(1)]^d $$

for all $t \le 1$ and hence by integrating

$$ s_i(1) \le [s_{i-1}(1)]^d. \tag{4} $$

■

One can calculate $s_1(1)$ directly from Theorem 1, and it follows from a simple induction on (4) that

$$ s_i(1) \le [s_1(1)]^{d^{i-1}}. $$

In other words, the $s_i(1)$, which represent the limiting fraction of bins with load at least i after all balls have
been thrown, decrease doubly exponentially, as the i is in the second level of the exponent. Using Kurtz's
theorem one obtains a high probability result for the case of a finite number of balls n.

Theorem 6 Let $w_i^n$ be the fraction of bins with load at least i when n balls are thrown into n bins using
GREEDY(d). Then for any $\epsilon > 0$ and fixed i, $w_i^n \le (1 + \epsilon)[s_1(1)]^{d^{i-1}}$ with probability at least $1 - e^{[\ldots]}$ for
some suitable constant c (the exponent, which involves c and $\epsilon^2$, is illegible in the source).

Proof: The proof is a direct application of Kurtz's Theorem, with the error bounds as given in [18, Theorem 
5.3]. ■ 

This doubly exponential decrease in the $s_i(1)$ (or, equivalently, of the $w_i^n$) is a key step of the proof of Azar et
al. in [1], where it is proven via an inductive use of Chernoff bounds. Theorem 6 shows that this induction can
be replaced by applying Kurtz's theorem to the differential equations, at least up to any fixed constant value
of i. This approach has some advantages. Most importantly, Lemma 5 and the corresponding inductive bound
for $s_i(1)$ seem quite natural and make transparent how the $s_i$ decrease. Additionally, using this approach can
improve the additive O(1) term in Theorem 4 of [1].

Intuitively, this doubly exponential decrease suggests that if we look at bins with load at least $i^* = \left\lceil \frac{\log \log n}{\log d} + \gamma \right\rceil$,
where $\gamma = (\log 2 - \log \log (1/s_1(1))) / \log d$, then

$$ s_{i^*+1}(1) \le [s_1(1)]^{d^{i^*}} \le \frac{1}{n^2}, $$

and hence with high probability there will be no bins with load at least $i^* + 1$.
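The threshold can be evaluated numerically. The sketch below assumes the formulas $i^* = \lceil \log\log n / \log d + \gamma \rceil$ with $\gamma = (\log 2 - \log\log(1/s_1(1)))/\log d$ and the inductive bound $[s_1(1)]^{d^{i-1}}$ on $s_i(1)$; the function names are hypothetical, and the value of $s_1(1)$ must be supplied by the caller (for d = 2 it is tanh(1), the solution of dy/dt = 1 - y^2 at t = 1).

```python
import math


def load_threshold(n, d, s1):
    """i* = ceil(log log n / log d + gamma), with natural logarithms."""
    gamma = (math.log(2) - math.log(math.log(1.0 / s1))) / math.log(d)
    return math.ceil(math.log(math.log(n)) / math.log(d) + gamma)


def tail_bound(i, d, s1):
    """Upper bound s1^(d^(i-1)) on the limiting fraction of bins with load >= i."""
    return s1 ** (d ** (i - 1))


if __name__ == "__main__":
    n, d = 10**6, 2
    s1 = math.tanh(1.0)  # s_1(1) for d = 2
    i_star = load_threshold(n, d, s1)
    print("i* =", i_star)
    print("bound at i*+1:", tail_bound(i_star + 1, d, s1), "vs 1/n^2 =", 1.0 / n**2)
```

This threshold comes from the inductive upper bound only, so it is conservative: the numerically computed $s_i(1)$ in Table 1 fall off considerably faster than the bound does.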

Note, however, that as stated in Theorem 6 the high probability bounds only hold up to any fixed i, and not
for values of i up to $\Omega(\log \log n)$. This weakness in the result stems from directly applying the differential
equations approach and Kurtz's theorem, which requires the underlying state space to be finite dimensional,
and hence loads up to only some fixed constant can be considered. By reverting back to the explicit martingale
argument that underlies Kurtz's Theorem, we can circumvent this restriction somewhat (up until the point where
$s_i$ is $1/n^{1/2+\delta}$ for any $\delta > 0$), but at some point when $s_i$ is sufficiently small it seems we have to explicitly handle
this case directly, as is done in Theorem 4 of [1]. We omit details of these more extensive arguments, since their
form would be almost entirely a restatement of Theorem 4 of [1], replacing their use of Chernoff bounds with
an equivalent martingale argument. A general framework that would allow us to apply the differential equations
in this instance in a more straightforward manner would clearly be appealing, since the differential equations
do make clear the behavior of the $s_i$.

In the case of m balls and n bins, a similar argument to Lemma 5 and Theorem 6 holds when m = cn for
some constant c. As in Section 2.1, when m = cn, the infinite process runs until time c; if m is not a linear
function of n, the time until the process terminates is dependent on n, and Kurtz's theorem cannot be applied.
For Lemma 5, the appropriate result becomes

$$ s_i(c) \le c\,[s_{i-1}(c)]^d. $$

By noting that $s_{c^2+c}(c) \le 1/(c+1)$ (the fraction of bins with load at least x cannot be more than c/x), one may
again inductively show as before that the tails of the loads are doubly exponentially decreasing. Improvements
can be made in the constants by using the differential equations to find better starting points than
$s_{c^2+c}(c) \le 1/(c+1)$ for the induction.

3 Conclusion 

There are significant advantages to using differential equations to study randomized load balancing problems.
The insight one gains about the problem and the numerical accuracy one obtains are quite convincing. Moreover,
when the corresponding state spaces are finite dimensional, applying Kurtz's Theorem can yield simple
proofs of the limiting behavior. A general framework for dealing with spaces that are not necessarily finite
dimensional would greatly simplify using this approach in developing bounds such as the O(log log n) bounds
of [1]. We expect this approach will find a great deal of further use in the analysis of load balancing schemes,
as well as other algorithmic areas.